Introduction
Context: Understanding patterns in crime rates is a topic of major interest for policymakers and law enforcement. Whereas research historically focused on socio-demographic factors such as age, gender, or socio-economic status, modern approaches also consider contextual factors of criminal behavior. More recently, weather parameters such as temperature, precipitation, and wind speed have been found to play an important role in understanding crime, and crime rates in major US cities and regions have been reported to correlate significantly with temperature. This project investigates the impact of weather on Boston crime rates by applying correlation analysis, significance tests, and linear regression.
Research Question: To what extent can weather parameters explain the variation in daily crimes in Boston in 2021? In addition, weekly, monthly, and annual patterns are examined.
Data & Processing Steps: Data from two sources are combined, standardized (z-score), and detrended before statistical testing. First, to calculate daily crime rates, the API of the Boston Police Department is queried for the period from January 1 to December 31, 2021. The resulting time series is then joined with weather data requested from the National Oceanic and Atmospheric Administration (NOAA). The final dataset consists of 365 entries containing daily records of crime rates, average temperature, rainfall, snowfall, and wind speed. Two additional data fields, ‘day-of-week’ and ‘day-of-month’, are added to investigate weekly and monthly patterns. The data retrieved from the Boston Police Department further include the category of each registered incident and its geo-location. Several incident types are removed from the dataset because they are not criminal offenses (e.g., Medical Assistance).
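The processing steps described above (joining by date, deriving the calendar fields, and z-score standardization) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code used for this report: the toy records, the field names `day_of_week` and `day_of_month`, and the helper names `join_by_date` and `zscore` are hypothetical. Detrending is left out because the report does not say which method was used.

```python
from datetime import date
from statistics import mean, stdev

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def join_by_date(crimes, weather):
    """Inner-join daily crime counts with weather records on the date key."""
    rows = []
    for day in sorted(crimes.keys() & weather.keys()):
        row = {"date": day, "CRIMES": crimes[day], **weather[day]}
        # Calendar fields used later for weekly and monthly patterns
        row["day_of_week"] = WEEKDAYS[day.weekday()]
        row["day_of_month"] = day.day
        rows.append(row)
    return rows

def zscore(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical toy records (made-up numbers, not BPD or NOAA data)
crimes = {date(2021, 1, 1): 150, date(2021, 1, 2): 120, date(2021, 1, 3): 90}
weather = {
    date(2021, 1, 1): {"TAVG": 35.0, "PRCP": 0.10},
    date(2021, 1, 2): {"TAVG": 38.0, "PRCP": 0.00},
    date(2021, 1, 3): {"TAVG": 31.0, "PRCP": 0.25},
}

rows = join_by_date(crimes, weather)
crimes_z = zscore([r["CRIMES"] for r in rows])
```

Standardizing every numeric column in this way puts the regression coefficients reported later on a common scale, which is what makes their magnitudes comparable.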
References:
- Crime Incident Reports BPD https://data.boston.gov/
- NCDC NOAA Climate Data https://www.ncdc.noaa.gov/
Which were the Top 20 Crimes in 2021?
Where were these criminal offenses reported?
The geo-locations of the incidents in 2021 show that more cases were registered in the denser areas around the city center.
How did the case numbers evolve throughout 2021?
The time series shows a typical cyclic pattern that is also found in other cities and regions: crime counts tend to be higher during the summer months and decrease towards winter. Sudden drops can be seen in December around Christmas.
Is there a monthly pattern?
The crime rates do not indicate a clear monthly pattern. However, crimes on the first day of the month appear to be substantially higher than average.
Is there a weekly pattern?
The box plots indicate that crimes may happen more frequently on Fridays and less frequently on Sundays. The rate appears to be stable throughout the rest of the workweek.
Could the weather have any influence on the number of crimes happening in Boston?
Crime rates and average daily temperature seem to follow a similar pattern over the course of the year.
How are weather parameters and crimes distributed?
Let’s take a closer look at the relationship between Crimes and daily average Temperature (TAVG). Is the relationship significant (alpha = 0.05)?
Inference: The correlation is positive and moderate (r = 0.57), and the small p-value (< 2.2e-16) suggests that the correlation between average daily temperature and crimes is significant at an alpha level of 0.05.
##
## Pearson's product-moment correlation
##
## data: df_norm0$TAVG and df_norm0$CRIMES
## t = 13.312, df = 363, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4994577 0.6378913
## sample estimates:
## cor
## 0.5727439
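The test statistic and confidence interval in this output can be reproduced from the correlation coefficient alone. The sketch below uses the standard formulas t = r * sqrt(n - 2) / sqrt(1 - r^2) and the Fisher z-transformation for the interval. It is a Python illustration rather than the code behind the report; the helper names `pearson_t` and `fisher_ci` are hypothetical, and n = 365 is inferred from df = 363.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

def pearson_t(r, n):
    """t statistic and degrees of freedom for H0: true correlation = 0."""
    df = n - 2
    return r * sqrt(df) / sqrt(1 - r * r), df

def fisher_ci(r, n, level=0.95):
    """Confidence interval for r via the Fisher z-transformation."""
    z = atanh(r)                       # transform r to the z scale
    se = 1 / sqrt(n - 3)               # standard error on the z scale
    q = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return tanh(z - q * se), tanh(z + q * se)

r, n = 0.5727439, 365                  # values taken from the output above
t_stat, df = pearson_t(r, n)
ci_low, ci_high = fisher_ci(r, n)
print(f"t = {t_stat:.3f}, df = {df}")
print(f"95% CI: [{ci_low:.4f}, {ci_high:.4f}]")
```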
Do crimes happen more often on particular weekdays? (Global F-Test)
Inference: The F-value (6.809) exceeds the critical F-value (2.123923), so H0 is rejected: there are significant differences among the weekdays (p = 7.48e-07).
## Df Sum Sq Mean Sq F value Pr(>F)
## aov_df$weekdays 6 35.58 5.929 6.809 7.48e-07 ***
## Residuals 358 311.74 0.871
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "F-critical: 2.12"
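The F-value in the table is the ratio of the between-group mean square to the within-group mean square. The following Python sketch shows that computation on a small made-up example and then recomputes the reported F-value from the sums of squares in the table. The helper name `one_way_anova` and the toy groups are hypothetical and are not taken from the report.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Toy groups (made-up values) to exercise the function
f_toy, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])

# Recompute F from the rounded sums of squares reported in the table
f_reported = (35.58 / 6) / (311.74 / 358)
```

Because the table values are rounded, `f_reported` agrees with the printed 6.809 only to about two decimal places.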
Which weekdays are significantly different from others? (Pairwise t-test)
Inference: Among consecutive weekdays, the pairs Friday-Thursday, Friday-Saturday, Sunday-Saturday, and Sunday-Monday show the greatest differences (all p < 0.05). Note that the p-values below are not adjusted for multiple comparisons.
##
## Pairwise comparisons using t tests with pooled SD
##
## data: aov_df$CRIMES and aov_df$weekdays
##
## Friday Monday Saturday Sunday Thursday Tuesday
## Monday 0.22330 - - - - -
## Saturday 0.02101 0.27513 - - - -
## Sunday 6.6e-09 3.7e-06 0.00035 - - -
## Thursday 0.04307 0.42048 0.77467 0.00012 - -
## Tuesday 0.01427 0.21700 0.88580 0.00059 0.66730 -
## Wednesday 0.15440 0.83669 0.37582 9.4e-06 0.54873 0.30349
##
## P value adjustment method: none
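Since the output reports unadjusted p-values, it can be useful to check how the four consecutive-day pairs would fare under a correction for multiple testing. The sketch below applies a Bonferroni correction, one possible choice among several, to the p-values copied from the table. It assumes all 21 weekday pairs count as comparisons; the helper name `bonferroni` is hypothetical, and this is not part of the original analysis.

```python
def bonferroni(p_values, n_comparisons):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    return {pair: min(1.0, p * n_comparisons) for pair, p in p_values.items()}

# Unadjusted p-values copied from the pairwise table above
reported = {
    ("Friday", "Thursday"): 0.04307,
    ("Friday", "Saturday"): 0.02101,
    ("Sunday", "Saturday"): 0.00035,
    ("Sunday", "Monday"): 3.7e-06,
}

n_pairs = 7 * 6 // 2  # all pairwise comparisons among seven weekdays
adjusted = bonferroni(reported, n_pairs)
significant = [pair for pair, p in adjusted.items() if p < 0.05]
```

Under this adjustment, the two Sunday pairs stay below 0.05 while the two Friday pairs do not, so the Friday differences may deserve a more cautious reading.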
Which factors help to explain variability in crime rates?
Inference: The model explains roughly 40% of the variation in daily crimes (adjusted R^2 = 0.41). The binary label ‘isSunday’ has the largest negative coefficient, whereas temperature (TAVG) has the largest positive coefficient. Wind (AWND) and snow (SNOW) do not appear to be significant contributors (p > 0.2) and are therefore candidates for removal from the model.
##
## Call:
## lm(formula = CRIMES ~ ., data = reg_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.3386 -0.4526 0.0068 0.4690 2.4385
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.05295 0.04663 1.136 0.256908
## isSunday -0.74789 0.11434 -6.541 2.11e-10 ***
## isFriday 0.36915 0.11347 3.253 0.001250 **
## AWND 0.04918 0.04047 1.215 0.225100
## TAVG 0.59383 0.04290 13.843 < 2e-16 ***
## PRCP -0.15093 0.04046 -3.730 0.000222 ***
## SNOW 0.04313 0.04009 1.076 0.282694
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7512 on 358 degrees of freedom
## Multiple R-squared: 0.4183, Adjusted R-squared: 0.4086
## F-statistic: 42.91 on 6 and 358 DF, p-value: < 2.2e-16
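The adjusted R-squared and the overall F-statistic in this summary both follow from the multiple R-squared, the number of observations, and the number of predictors. The Python sketch below checks that relationship with the rounded values from the output. The helper names are hypothetical, and n = 365 with p = 6 predictors is inferred from the 358 residual degrees of freedom.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def f_statistic(r2, n, p):
    """Overall F-statistic of a linear model, computed from its R^2."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))

n, p, r2 = 365, 6, 0.4183  # values read from the summary above
adj = adjusted_r2(r2, n, p)
f_val = f_statistic(r2, n, p)
print(f"adjusted R^2 = {adj:.4f}, F = {f_val:.2f}")
```

The penalty term (n - 1) / (n - p - 1) is why removing weak predictors can leave the adjusted R-squared almost unchanged even though the multiple R-squared drops.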
Applying backward feature elimination to the model parameters to find the best subset of predictors
Inference: The AIC improved (decreased) from -201.91 to -203.39 after removing the two parameters Snow and Wind. Furthermore, the F-statistic increased substantially (from 42.91 to 63.67), whereas the adjusted R^2 decreased only slightly (from 0.4086 to 0.4078).
## Start: AIC=-201.91
## CRIMES ~ isSunday + isFriday + AWND + TAVG + PRCP + SNOW
##
## Df Sum of Sq RSS AIC
## - SNOW 1 0.653 202.67 -202.731
## - AWND 1 0.833 202.85 -202.406
## <none> 202.02 -201.909
## - isFriday 1 5.972 207.99 -193.275
## - PRCP 1 7.851 209.87 -189.992
## - isSunday 1 24.143 226.16 -162.704
## - TAVG 1 108.129 310.15 -47.438
##
## Step: AIC=-202.73
## CRIMES ~ isSunday + isFriday + AWND + TAVG + PRCP
##
## Df Sum of Sq RSS AIC
## - AWND 1 0.743 203.42 -203.395
## <none> 202.67 -202.731
## - isFriday 1 6.128 208.80 -193.857
## - PRCP 1 7.481 210.15 -191.500
## - isSunday 1 23.793 226.47 -164.216
## - TAVG 1 108.205 310.88 -48.581
##
## Step: AIC=-203.39
## CRIMES ~ isSunday + isFriday + TAVG + PRCP
##
## Df Sum of Sq RSS AIC
## <none> 203.42 -203.395
## - isFriday 1 6.356 209.77 -194.164
## - PRCP 1 6.888 210.30 -193.240
## - isSunday 1 23.641 227.06 -165.263
## - TAVG 1 107.915 311.33 -50.049
##
## Call:
## lm(formula = CRIMES ~ isSunday + isFriday + TAVG + PRCP, data = reg_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.3474 -0.4445 -0.0085 0.4662 2.4662
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.05008 0.04662 1.074 0.283484
## isSunday -0.73891 0.11423 -6.468 3.24e-10 ***
## isFriday 0.38010 0.11333 3.354 0.000881 ***
## TAVG 0.57936 0.04192 13.820 < 2e-16 ***
## PRCP -0.13847 0.03966 -3.491 0.000540 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7517 on 360 degrees of freedom
## Multiple R-squared: 0.4143, Adjusted R-squared: 0.4078
## F-statistic: 63.67 on 4 and 360 DF, p-value: < 2.2e-16
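The AIC values in the elimination trace are consistent with the form n * log(RSS / n) + 2 * k, where k counts the estimated coefficients including the intercept. The Python sketch below recomputes the first and last AIC from the residual sums of squares shown in the trace. The helper name `step_aic` is hypothetical, and that the original software used exactly this formula is an assumption.

```python
from math import log

def step_aic(rss, n, n_params):
    """AIC up to an additive constant: n * log(RSS / n) + 2 * k."""
    return n * log(rss / n) + 2 * n_params

n = 365
# Full model: intercept + 6 predictors, RSS = 202.02
aic_full = step_aic(202.02, n, 7)
# Reduced model: intercept + 4 predictors, RSS = 203.42
aic_reduced = step_aic(203.42, n, 5)
print(f"full: {aic_full:.2f}, reduced: {aic_reduced:.2f}")
```

Lower values are better here: dropping SNOW and AWND raises the RSS only slightly, and the saved penalty of 2 per parameter more than offsets that increase.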
Conclusion
The statistical analysis of weather and crime data has shown a significant association between Boston’s average daily temperature, precipitation, and crime rates. Moreover, a weekly pattern in crimes was found, indicating that more crimes happen on Fridays, whereas fewer cases are reported on Sundays. The linear regression model, including the parameters named above, explained about 40% of the variability in crimes. Future research may investigate whether these associations are stronger in particular areas within Boston.